A Novel Approach to Improve the Mongolian Language Model Using Intermediate Characters
Abstract
In the Mongolian language, many words share the same presentation form but are distinct words with different character codes. Because typists usually enter words according to their presentation forms and sometimes cannot distinguish the underlying codes, Mongolian corpora contain many coding errors. These errors make statistics and retrieval on such corpora very difficult. To solve this problem, this paper proposes a method that merges words with the same presentation form by mapping them to intermediate characters, and then uses the corpus in intermediate-character form to build a Mongolian language model. Experimental results show that, compared with a model trained on the unprocessed corpus, the proposed method reduces the perplexity and the word error rate of the 3-gram language model by 41% and 30%, respectively. The proposed approach significantly improves the performance of the Mongolian language model and greatly enhances the accuracy of Mongolian speech recognition.
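The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `INTERMEDIATE` table, the Latin stand-in letters, and the toy corpus are all hypothetical, chosen only to show how words that look alike but are coded differently collapse into one entry before trigram counts are collected.

```python
from collections import Counter

# Hypothetical table (assumption, not from the paper): letters that would
# render identically are mapped to one shared intermediate character.
INTERMEDIATE = {"o": "O", "u": "O", "t": "T", "d": "T"}


def to_intermediate(word):
    """Rewrite a word so that same-looking letters share one symbol."""
    return "".join(INTERMEDIATE.get(ch, ch) for ch in word)


def normalize_corpus(sentences):
    """Apply the intermediate-character mapping to every word."""
    return [[to_intermediate(w) for w in sent] for sent in sentences]


def trigram_counts(sentences):
    """Count trigrams and their two-word histories for a 3-gram model."""
    tri, bi = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            bi[tuple(padded[i - 2:i])] += 1
    return tri, bi


# Toy corpus (assumption): three typings of what looks like the same sentence.
corpus = [["ulus", "tur"], ["olus", "dur"], ["ulus", "dur"]]
normalized = normalize_corpus(corpus)

vocab_before = {w for sent in corpus for w in sent}
vocab_after = {w for sent in normalized for w in sent}
tri_raw, _ = trigram_counts(corpus)
tri, bi = trigram_counts(normalized)

print(len(vocab_before), len(vocab_after))  # 4 2
print(tri_raw[("<s>", "<s>", "ulus")])      # 2
print(tri[("<s>", "<s>", "OlOs")])          # 3
```

In this toy setting the vocabulary shrinks from four entries to two and the trigram evidence is pooled rather than split across spelling variants, which is the kind of effect that would be expected to lower perplexity on a real corpus.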
Similar resources
Comparison of Spectral Methods for Spoken Language Identification
Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. A statistical model of these features is then obtained for each language. The ...
Language Model for Cyrillic Mongolian to Traditional Mongolian Conversion
Traditional Mongolian and Cyrillic Mongolian are both Mongolian languages, used in China and Mongolia respectively. Although their pronunciation is similar, their written forms are totally different. Many Cyrillic Mongolian words have more than one counterpart in Traditional Mongolian, which makes conversion from Cyrillic Mongolian to Traditional Mongolian a hard problem. To overc...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three kinds of methods have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well for the Persian language if they are trained with good features. To get good performanc...
Multi-period and Multi-objective Stock Selection Optimization Model Based on Fuzzy Interval Approach
The optimization of investment portfolios is the most important topic in financial decision making, and many relevant models can be found in the literature. Given the importance of portfolio optimization, this paper deals with novel solution approaches for a newly developed portfolio optimization model. Contrary to previous work, the uncertainty of future retur...
Development and Validation of Teacher Emotional Support Scale: a structural equation modeling approach
A review of the literature found no validated model that examines the extent to which teachers support their students emotionally in EFL classrooms. The present study therefore addresses this issue by developing and validating a teacher emotional support scale in an Iranian English as a foreign language context. The main components of the scale were specified based on Hamre...